2020/08/20
Adapted and simplified from Judea Pearl's more technical and complete 2016 resolution of the paradox.
"A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June are recorded."" (Lord 1967, p. 304)
Lord posits two hypothetical statisticians who analyze the data
Who is right? How do you decide?
E.g. does poverty (X) cause cancer (Y)?
Maybe poverty (X) increases your exposure to environmental toxins (M), and it's that exposure that causes cancer (Y). I.e. Environmental toxins mediate the effect of poverty on cancer risk.
And maybe there are other ways in which poverty affects cancer risk that have nothing to do with environmental toxins.
We can think of the effect of X on Y in two ways:
We can think of the effect of X on Y in two ways:
NOT Lord's data
# Start with 1000 students f_0 = rnorm(500, 150, 5) m_0 = rnorm(500, 160, 5)
# Start with 1000 students
f_0 = rnorm(500, 150, 5)
m_0 = rnorm(500, 160, 5)
g_fun = function(w_0){
0.2 * w_0 + rnorm(length(w_0))
}
# Start with 1000 students
m_df = (data.frame(Sex = rep(c('F','M'), each = 500),
Initial = c(f_0, m_0))
%>% mutate(Gain = g_fun(Initial),
Final = Initial + Gain))
Statistician 1 does not control for initial weight:
\[ G \sim \beta_0 + \beta_1 * S \]
The \(S\) variable here incorporates the total effect of sex on weight gain
\(TE = b + ac - a\)
Intercept and sex are significant in this model
m_mod1 = lm(Gain ~ Sex, data = m_df) summary(m_mod1)
## ## Call: ## lm(formula = Gain ~ Sex, data = m_df) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.2663 -0.9626 -0.0038 1.0226 3.9809 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 30.04605 0.06432 467.1 <2e-16 *** ## SexM 1.86457 0.09097 20.5 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.438 on 998 degrees of freedom ## Multiple R-squared: 0.2963, Adjusted R-squared: 0.2956 ## F-statistic: 420.2 on 1 and 998 DF, p-value: < 2.2e-16
Intercept and sex are significant in this model
Controls for initial weight in their model
\[ G \sim \beta_0 + (\beta_1 * S) + (\beta_2 * W_I) \]
The sex variable here only includes the direct effect of sex: \(b\)
Intercept and sex are now null. The only variable that affects the outcome is initial weight
m_mod2 = lm(Gain ~ Sex + Initial, data = m_df) summary(m_mod2)
## ## Call: ## lm(formula = Gain ~ Sex + Initial, data = m_df) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.12409 -0.66387 0.00598 0.69323 3.15913 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.534847 0.907930 -0.589 0.556 ## SexM -0.095269 0.085131 -1.119 0.263 ## Initial 0.203625 0.006038 33.722 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.9836 on 997 degrees of freedom ## Multiple R-squared: 0.6712, Adjusted R-squared: 0.6706 ## F-statistic: 1018 on 2 and 997 DF, p-value: < 2.2e-16
Who is "correct?"
How does this happen?
How does this happen? Easy!
Choose the values correctly, and it all cancels.
Lord plotted final weight vs. initial weight
Look at gain vs. initial weight.
Lord's paradox has nothing to do with baseline values
Same thing we do every night, Pinky
Two stastisticians analyze these data. It was 1967, so the available tools were:
The first statistician compares the mean and variance of the initial vs. final weights in each of the two sexes:
Any individual gains or losses cancel each other out, and overall there is negligible change in student weight in either group.
l_a_anov = aov(Weight ~ Timepoint, filter(l_df_long, Sex == 'F')) Anova(l_a_anov)
## Anova Table (Type II tests) ## ## Response: Weight ## Sum Sq Df F value Pr(>F) ## Timepoint 0.1 1 0.004 0.9496 ## Residuals 21128.4 998
l_b_anov = aov(Weight ~ Timepoint, filter(l_df_long, Sex == 'M')) Anova(l_b_anov)
## Anova Table (Type II tests) ## ## Response: Weight ## Sum Sq Df F value Pr(>F) ## Timepoint 3.6 1 0.1728 0.6777 ## Residuals 20813.1 998
We can do an equivalent thing with a linear model:
summary(l_mod1)
## ## Call: ## lm(formula = Gain ~ Sex, data = l_df) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.9143 -1.4764 -0.0168 1.5077 6.5446 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.01839 0.10195 0.180 0.857 ## SexM 0.10168 0.14418 0.705 0.481 ## ## Residual standard error: 2.28 on 998 degrees of freedom ## Multiple R-squared: 0.0004981, Adjusted R-squared: -0.0005034 ## F-statistic: 0.4973 on 1 and 998 DF, p-value: 0.4808
The second statistician decides to control for the initial weight of the students.
l_ancov = aov(Gain ~ Sex + Initial, data = l_df) Anova(l_ancov)
## Anova Table (Type II tests) ## ## Response: Gain ## Sum Sq Df F value Pr(>F) ## Sex 620.6 1 151.52 < 2.2e-16 *** ## Initial 1102.9 1 269.26 < 2.2e-16 *** ## Residuals 4083.9 997 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
l_mod2 = lm(Gain ~ Sex + Initial, data = l_df) summary(l_mod2)
## ## Call: ## lm(formula = Gain ~ Sex + Initial, data = l_df) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.9913 -1.3603 0.0365 1.3351 6.7667 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 32.50732 1.98201 16.40 <2e-16 *** ## SexM 2.25827 0.18346 12.31 <2e-16 *** ## Initial -0.21680 0.01321 -16.41 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.024 on 997 degrees of freedom ## Multiple R-squared: 0.213, Adjusted R-squared: 0.2115 ## F-statistic: 134.9 on 2 and 997 DF, p-value: < 2.2e-16
The overall means and distributions may not have changed in either sex, but for any given starting weight, male students gained more than female students.
Who is right? How do we decide?